A NOTE ON SHANNON ENTROPY 
Abstract. We present a somewhat different way of looking at Shannon entropy.
This leads to an axiomatisation of Shannon entropy that is essentially
equivalent to that of Fadeev.



1. Introduction 

A large part of the discrete theory of information concerns itself with real
functions $H$ defined on the family of sequences $(p_1, \ldots, p_n)$ such that
$p_i \geq 0$ and $\sum p_i = 1$. There, a very significant role is played by the
Shannon entropy,¹ given by

$$H(p_1, \ldots, p_n) = p_1 \log \frac{1}{p_1} + \ldots + p_n \log \frac{1}{p_n}.$$

It is the only symmetric (i.e. independent of the order of the $p_i$-s) continuous
function of such sequences that is normalised by $H(1/2, 1/2) = 1$ and satisfies
the following grouping axiom:

$$\begin{aligned}
H(a_1 p_1, \ldots, a_k p_1,\; b_1 p_2, \ldots, b_l p_2,\; \ldots,\; c_1 p_n, \ldots, c_m p_n)
  ={}& H(p_1, p_2, \ldots, p_n) \\
  &+ p_1 H(a_1, \ldots, a_k) + p_2 H(b_1, \ldots, b_l) + \ldots + p_n H(c_1, \ldots, c_m).
\end{aligned} \tag{1}$$

This result, which improves on Shannon's own set of axioms (see [11]), is a
slight modification of Fadeev's axioms of entropy, cf. [7].
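
The normalisation and the grouping axiom are easy to check numerically. The following is a minimal Python sketch, not part of the original argument; the name `shannon_entropy` and the sample weights are illustrative assumptions.

```python
import math

def shannon_entropy(p):
    """Base-2 Shannon entropy of a finite probability vector p."""
    # Zero entries contribute nothing (convention 0 * log(1/0) = 0).
    return sum(x * math.log2(1.0 / x) for x in p if x > 0)

# Coarse distribution (p_1, p_2, p_3) and the weights that split each p_i.
p = [0.5, 0.3, 0.2]
splits = [[0.5, 0.5], [0.2, 0.3, 0.5], [1.0]]  # each list sums to 1

# Left-hand side of the grouping axiom: entropy of the refined distribution.
refined = [w * p_i for p_i, ws in zip(p, splits) for w in ws]
lhs = shannon_entropy(refined)

# Right-hand side: coarse entropy plus the p_i-weighted entropies of the splits.
rhs = shannon_entropy(p) + sum(
    p_i * shannon_entropy(ws) for p_i, ws in zip(p, splits)
)

print(shannon_entropy([0.5, 0.5]))  # normalisation H(1/2, 1/2)
print(math.isclose(lhs, rhs))       # grouping axiom, up to rounding
```

A single numerical instance is of course only a sanity check, not evidence of uniqueness.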

The shape of the grouping axiom leads us to think about entropy as a value
assigned to transformations, divisions or partitions, say from a number $p_1$ to
its partition $a_1 p_1, \ldots, a_k p_1$, where the $a_i$ sum up to 1. In fact, we
will extend $H$ to nonnegative sequences, so that
$H(a_1 p_1, \ldots, a_k p_1) = p_1 H(a_1, \ldots, a_k)$ and $H$ satisfies

$$\begin{aligned}
H(a_1 p_1, \ldots, a_k p_1,\; b_1 p_2, \ldots, b_l p_2,\; \ldots,\; c_1 p_n, \ldots, c_m p_n)
  ={}& H(p_1, p_2, \ldots, p_n) \\
  &+ H(a_1 p_1, \ldots, a_k p_1) + H(b_1 p_2, \ldots, b_l p_2) + \ldots + H(c_1 p_n, \ldots, c_m p_n)
\end{aligned}$$

whenever the $a_i$-s, the $b_i$-s, ..., and the $c_i$-s sum up to 1.

Our formulation fits the beautiful approach to entropy presented in [2] a bit
more naturally than the system of axioms of Fadeev that was originally used.

For a detailed exposition of Shannon entropy, related entropies and the various
conditions related to their definition, see [1]. For a modern survey of
characterisations of Shannon entropy (among other things), see [3].
¹ Here, as throughout the paper, we confine ourselves to base 2 logarithms, as
dictated by information theory tradition.
2. Entropy as a homogeneous quantity, additive on partitionings of positive numbers

Each function $H$ on sequences $(p_1, \ldots, p_n)$, $p_i \geq 0$, $\sum p_i = 1$,
can be naturally extended to a homogeneous² function $\tilde{H}$ on sequences
$(a_1, \ldots, a_n)$, $a_i \geq 0$, $\sum a_i > 0$, by setting

$$\tilde{H}(a_1, \ldots, a_n) = s\, H\!\left(\frac{a_1}{s}, \ldots, \frac{a_n}{s}\right), \quad \text{where } s = a_1 + \ldots + a_n.$$

Conversely, every function $\tilde{H}$ on sequences of nonnegative reals can be
restricted to the domain of $H$. This leads to a bijective identification of $H$
and a homogeneous $\tilde{H}$.
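
The extension and its two defining properties (homogeneity, and agreement with the original function on probability vectors) can be sketched in Python as follows. The names `shannon_entropy` and `homogeneous_extension` are hypothetical, and Shannon entropy is used only as a convenient example of a function to extend.

```python
import math

def shannon_entropy(p):
    """Base-2 Shannon entropy of a probability vector (example function H)."""
    return sum(x * math.log2(1.0 / x) for x in p if x > 0)

def homogeneous_extension(a, H=shannon_entropy):
    """Extend H to nonnegative sequences with positive sum s via s * H(a / s)."""
    s = sum(a)
    if s <= 0 or any(x < 0 for x in a):
        raise ValueError("entries must be nonnegative with a positive sum")
    return s * H([x / s for x in a])

a = [1.0, 2.0, 5.0]
c = 3.5

# Homogeneity: scaling every entry by c > 0 scales the value by c.
scaled = homogeneous_extension([c * x for x in a])
print(math.isclose(scaled, c * homogeneous_extension(a)))

# Restriction: on a probability vector the extension agrees with H.
q = [0.25, 0.75]
print(math.isclose(homogeneous_extension(q), shannon_entropy(q)))
```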

The function $H$ satisfies the entropy equation (1) if and only if $\tilde{H}$
satisfies the following equation, closely related to the 2-cocycle equation
(cf. [5], and Remark 1 in [12]):

$$\begin{aligned}
\tilde{H}(a_1, \ldots, a_k,\; b_1, \ldots, b_l,\; \ldots,\; c_1, \ldots, c_m)
  ={}& \tilde{H}(a_1 + \ldots + a_k,\; b_1 + \ldots + b_l,\; \ldots,\; c_1 + \ldots + c_m) \\
  &+ \tilde{H}(a_1, \ldots, a_k) + \tilde{H}(b_1, \ldots, b_l) + \ldots + \tilde{H}(c_1, \ldots, c_m).
\end{aligned} \tag{2}$$

If we interpret $\tilde{H}(a_1, \ldots, a_k)$ as the 'entropy' of the partitioning
of $a_1 + \ldots + a_k$ into $a_1, \ldots, a_k$, this equation expresses a kind of
additivity: the entropy of the partitioning of
$a_1 + \ldots + a_k + b_1 + \ldots + b_l + \ldots + c_1 + \ldots + c_m$ into
$a_1, \ldots, a_k, b_1, \ldots, b_l, \ldots, c_1, \ldots, c_m$ is the sum of the
'entropies' of the half-way partitionings that go through the groups
$a_1 + \ldots + a_k$, $b_1 + \ldots + b_l$, ..., $c_1 + \ldots + c_m$.
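
This additivity over half-way partitionings can be checked on a small example. The sketch below is illustrative only: `partition_entropy` is a hypothetical name for the homogeneous extension of Shannon entropy, and the groups are arbitrary sample values.

```python
import math

def partition_entropy(a):
    """Homogeneous extension of base-2 Shannon entropy: s * H(a / s)."""
    s = sum(a)
    return s * sum((x / s) * math.log2(s / x) for x in a if x > 0)

# Three groups playing the roles of (a_i), (b_i) and (c_i).
groups = [[1.0, 2.0], [0.5, 0.5, 3.0], [4.0]]

# Direct partitioning of the total into all the small pieces.
flat = [x for grp in groups for x in grp]
direct = partition_entropy(flat)

# Two-step partitioning: first into group sums, then each group into pieces.
coarse = partition_entropy([sum(grp) for grp in groups])
within = sum(partition_entropy(grp) for grp in groups)

print(math.isclose(direct, coarse + within))
```

Note that a one-element group such as `[4.0]` contributes zero, since a number partitioned into itself carries no entropy.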

It is rather expected that a symmetric function $\tilde{H}$ (not necessarily
homogeneous) satisfies the 2-cocycle equation if and only if there exists a
'potential' function $g : [0, \infty) \to \mathbb{R}$ such that (see Lemma 1 in [12])

$$\tilde{H}(a_1, \ldots, a_n) = g(a_1) + \ldots + g(a_n) - g(a_1 + \ldots + a_n). \tag{3}$$

Moreover we can, and we will, assume that $g(1) = 0$. Since $H(1/2, 1/2) = 1$ we
have $g(1/2) = 1/2$.
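
The easy direction of this equivalence, and the fact that homogeneity is a separate requirement, can be illustrated numerically. In the sketch below the potential `g_example` is an arbitrary choice with g(1) = 0 made purely for illustration (it is not the entropy potential and ignores the normalisation), and `from_potential` is a hypothetical helper name.

```python
import math

def from_potential(g):
    """Build (a_1, ..., a_n) -> g(a_1) + ... + g(a_n) - g(a_1 + ... + a_n)."""
    def h(a):
        return sum(g(x) for x in a) - g(sum(a))
    return h

def g_example(x):
    # Arbitrary illustrative potential with g(1) = 0; NOT the entropy potential.
    return x * x - x

h = from_potential(g_example)

groups = [[1.0, 2.0], [0.5, 0.5, 3.0], [4.0]]
flat = [x for grp in groups for x in grp]

# The cocycle-type equation holds for any potential, because the g(group sum)
# terms cancel between the coarse step and the within-group steps.
lhs = h(flat)
rhs = h([sum(grp) for grp in groups]) + sum(h(grp) for grp in groups)
print(math.isclose(lhs, rhs))

# Homogeneity, however, fails for this particular g.
print(h([2.0, 2.0]), 2 * h([1.0, 1.0]))
```

The last line prints two different numbers, which is why homogeneity has to be imposed as an extra condition on the potential.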

Now, $\tilde{H}$ is homogeneous if and only if $\tilde{H}(\cdot, \cdot)$ is
homogeneous. This in turn is equivalent to

$$g(a(b_1 + b_2)) - g(a b_1) - g(a b_2) = a\,[\,g(b_1 + b_2) - g(b_1) - g(b_2)\,]. \tag{4}$$

Let $D$ be a function on pairs of nonnegative numbers such that

$$g(ab) = a\,g(b) + b\,g(a) + D(a, b). \tag{5}$$



It follows that $g$ satisfies equation (4) if and only if
$D(a, \cdot) = D(\cdot, a)$ is additive, i.e. if and only if $D$ is
$\mathbb{Q}_+$-bilinear.
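
The equivalence can be verified by direct substitution. The following worked computation is a sketch of that step, using only the defining relation (5) applied to each product on the left-hand side of (4).

```latex
% Substitute g(xy) = x g(y) + y g(x) + D(x, y) into each term:
\begin{align*}
g(a(b_1+b_2)) - g(ab_1) - g(ab_2)
  &= \bigl[a\,g(b_1+b_2) + (b_1+b_2)\,g(a) + D(a, b_1+b_2)\bigr] \\
  &\quad - \bigl[a\,g(b_1) + b_1\,g(a) + D(a, b_1)\bigr]
         - \bigl[a\,g(b_2) + b_2\,g(a) + D(a, b_2)\bigr] \\
  &= a\,\bigl[g(b_1+b_2) - g(b_1) - g(b_2)\bigr]
     + D(a, b_1+b_2) - D(a, b_1) - D(a, b_2).
\end{align*}
% The terms in g(a) cancel, so (4) holds exactly when
% D(a, b_1 + b_2) = D(a, b_1) + D(a, b_2).
```

Since (5) is symmetric in $a$ and $b$, so is $D$, and additivity in one argument gives additivity in both.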

Assume now that $\tilde{H}$ is derived from $H$, i.e. that $\tilde{H}$ is
homogeneous. Then $D$ is $\mathbb{Q}_+$-bilinear. Since $g(1) = 0$ we conclude that

$$g(ab) = a\,g(b) + b\,g(a) \quad \text{for nonnegative rational } a \text{ and } b. \tag{6}$$
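
As a sanity check, the potential that the argument eventually arrives at, g(x) = x log(1/x), does satisfy this product rule. The short Python sketch below tests it on a few rational pairs; the function names `g` and `defect` and the sample pairs are illustrative assumptions, and floating-point arithmetic means equality is only checked up to rounding.

```python
import math
from fractions import Fraction

def g(x):
    """Candidate potential g(x) = x * log2(1/x), with g(0) = 0."""
    x = float(x)
    return x * math.log2(1.0 / x) if x > 0 else 0.0

def defect(a, b):
    """g(ab) - a g(b) - b g(a), i.e. the quantity D(a, b) of equation (5)."""
    return g(a * b) - float(a) * g(b) - float(b) * g(a)

pairs = [
    (Fraction(1, 2), Fraction(3, 7)),
    (Fraction(5, 3), Fraction(2, 9)),
    (Fraction(4), Fraction(1, 8)),
]

# The defect vanishes (up to rounding) for every sample pair.
print(all(abs(defect(a, b)) < 1e-12 for a, b in pairs))
print(g(1), g(Fraction(1, 2)))  # the normalisations g(1) and g(1/2)
```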

Now, let us remark that if we could prove that $g$ is continuous, it would follow
that equation (6) is satisfied for all nonnegative $a$, $b$. Then the function $l$
such that $g(a) = a\,l(a)$ would be continuous and would satisfy the equation

$$l(ab) = l(a) + l(b), \tag{7}$$

and therefore would be a logarithm, $l(x) = c \cdot \ln x$. Since $l(1/2) = 1$, we
would have $l(x) = \log(1/x)$ and $g(x) = x \log(1/x)$, i.e. we would show that
$\tilde{H}$ is a Shannon entropy.

² i.e. such that $\tilde{H}(c a_1, \ldots, c a_n) = c\,\tilde{H}(a_1, \ldots, a_n)$, for $c > 0$.

Assume that $H$ is continuous, or equivalently that $\tilde{H}$ is continuous. The
proof that $g$ is continuous, or that $l$ is a logarithm, is the crucial part of
the reasoning. The author of this paper cannot find an easier way, or for that
matter any significantly different proof, than the one that makes use of the
following theorem (cf. [9], Ch. IX, Theorem 2, p. 544).

Theorem 1. Let $l : \{1, 2, \ldots\} \to \mathbb{R}$ be a function that satisfies
the following conditions:

$$l(ab) = l(a) + l(b), \tag{8}$$

$$l(n+1) - l(n) \to 0 \quad \text{as } n \text{ tends to } \infty. \tag{9}$$

Then $l(n) = c \cdot \ln n$.

By using this theorem our reasoning converges with the proof (as given in [5],
Ch. IX) that the Fadeev axiomatisation uniquely describes Shannon entropy. In
fact, from the continuity of $\tilde{H}$ we have

$$\frac{l(n+1)}{n} + l(n+1) - l(n) = \frac{n+1}{n}\left[\,g(1) - g\!\left(\frac{1}{n+1}\right) - g\!\left(\frac{n}{n+1}\right)\right] \to 0,$$

which is almost (9). The only element missing is supplied by the following
elementary result due to Mercer (see jS], or [§])

Theorem 2. Let $a_n$ be a sequence of numbers, and let $s_n = a_1 + \ldots + a_n$
be its partial sum. Then

$$a_n \to a \quad \text{if and only if} \quad a_n + \frac{s_n}{n} \to 2a.$$



We just need to use it with $a_n = l(n+1) - l(n)$ and $a = 0$; here
$s_n = l(n+1)$, since $l(1) = 0$. We infer that $l(a) = \log(1/a)$ and
$g(a) = a \log(1/a)$ for rational $a > 0$. Denote $u(a) := a \log(1/a)$ for all
$a > 0$. Then from equation (3) for rational $a_i$ and the continuity of
$\tilde{H}$ and $u$ we infer that

$$\tilde{H}(a_1, \ldots, a_n) = u(a_1) + \ldots + u(a_n) - u(a_1 + \ldots + a_n).$$
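
The quantities entering the application of Theorem 2 can be tabulated for the solution l(n) = log(1/n) itself. The Python sketch below is an illustration under that assumption and proves nothing; `mercer_terms` is a hypothetical helper name.

```python
import math

def l(n):
    """The solution found in the text: l(n) = log2(1/n)."""
    return -math.log2(n)

def mercer_terms(n):
    """Return (a_n, a_n + s_n / n) for the sequence a_k = l(k+1) - l(k)."""
    a_n = l(n + 1) - l(n)
    s_n = l(n + 1) - l(1)  # the partial sum a_1 + ... + a_n telescopes
    return a_n, a_n + s_n / n

# Both columns shrink towards 0 as n grows, matching the case a = 0.
for n in (10, 1_000, 100_000):
    a_n, combined = mercer_terms(n)
    print(n, a_n, combined)
```

The second column is the expression that continuity controls directly; Theorem 2 is what converts its convergence into the convergence of the first column.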

This concludes our reasoning. Gathering it all together, we have shown the 
following theorem, essentially equivalent to the Fadeev axiomatization of entropy: 

Theorem 3. Let $\tilde{H}$ be a function that satisfies the following conditions:

(1) $\tilde{H}$ is homogeneous,

(2) $\tilde{H}$ is symmetric,

(3) $\tilde{H}$ satisfies the 2-cocycle equation (2),

(4) $\tilde{H}$ is continuous,

(5) $\tilde{H}(1, 1) = 2$.

Then $\tilde{H}(a_1, \ldots, a_n) = u(a_1) + \ldots + u(a_n) - u(a_1 + \ldots + a_n)$,
where $u(x) = x \log(1/x)$.
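
The closed form in the theorem can be compared numerically with the homogeneous extension of Shannon entropy. The following Python sketch is illustrative; the names `u`, `entropy_from_u` and `scaled_entropy` are assumptions introduced here, and u(0) is taken to be 0.

```python
import math

def u(x):
    """u(x) = x * log2(1/x), with the convention u(0) = 0."""
    return x * math.log2(1.0 / x) if x > 0 else 0.0

def entropy_from_u(a):
    """Closed form of the theorem: u(a_1) + ... + u(a_n) - u(a_1 + ... + a_n)."""
    return sum(u(x) for x in a) - u(sum(a))

def scaled_entropy(a):
    """Homogeneous extension s * H(a / s) of base-2 Shannon entropy."""
    s = sum(a)
    return s * sum((x / s) * math.log2(s / x) for x in a if x > 0)

a = [0.3, 1.7, 4.0, 0.0]

print(entropy_from_u([1.0, 1.0]))                           # normalisation (5)
print(math.isclose(entropy_from_u(a), scaled_entropy(a)))   # the two forms agree
```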

These notes are the expression of my unwritten view of Shannon entropy that I
held for a few years. This view was brought back to mind by my recent
publications [10] and [12], and in particular by stumbling upon [2].